1.
Sci Rep ; 13(1): 959, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36653463

ABSTRACT

Past research in computational systems biology has focused more on the development and application of advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low-dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume that the biological space is flat. In contrast, recent theoretical studies suggest that biological systems exhibit tree-like topology with a high degree of clustering. As a consequence, embedding a biological system in a flat space distorts the distances between biological objects. Here, we present a novel matrix factorization methodology for drug-target interaction prediction that uses hyperbolic space as the latent biological space. When benchmarked against classical, Euclidean methods, hyperbolic matrix factorization exhibits superior accuracy while lowering the embedding dimension by an order of magnitude. We see this as additional evidence that hyperbolic geometry underpins large biological networks.
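A minimal sketch of the kind of distance a hyperbolic embedding relies on, assuming the Poincaré-ball model of hyperbolic space (the abstract does not say which model the paper uses). The logistic link in `interaction_probability` and its `bias` parameter are hypothetical illustrations of a drug-target score that decreases with hyperbolic distance; they are not the paper's formulation.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_gap = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_gap / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


def interaction_probability(drug, target, bias=0.0):
    """Hypothetical score: the closer two embeddings are, the higher the probability."""
    return 1.0 / (1.0 + math.exp(poincare_distance(drug, target) - bias))


print(poincare_distance([0.0, 0.0], [0.5, 0.0]))   # about 1.0986, i.e. ln 3
print(poincare_distance([0.9, 0.0], [0.9, 0.05]))  # far larger than the Euclidean gap of 0.05
print(interaction_probability([0.1, 0.2], [0.1, 0.2]))  # 0.5 at zero distance with bias 0
```

Points near the boundary of the ball are far apart even when their Euclidean gap is small, which is what allows a low-dimensional hyperbolic space to hold tree-like, highly clustered structure with little distortion.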


Subjects
Drug Development , Theoretical Models , Mathematics , Systems Biology , Drug Delivery Systems
2.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2377-2384, 2022.
Article in English | MEDLINE | ID: mdl-33591920

ABSTRACT

Modeling complex biological systems is necessary to understand the biochemical interactions behind the pharmacological effects of drugs. Successful in silico drug repurposing relies on the exploration of diverse biochemical concepts and their relationships, including drugs' adverse reactions, drug targets, disease symptoms, and disease-associated genes and their pathways, to name a few. We present a computational method for inferring drug-disease associations from complex but incomplete and biased biological networks. Our method employs matrix completion to overcome the sparseness of biomedical data and to enrich the set of relationships between different biomedical entities. We present a strategy for identifying network paths supportive of drug efficacy, as well as a computational procedure capable of combining different network patterns to better distinguish treatments from non-treatments. The algorithm is available at http://bioinfo.cs.uni.edu/AEONET.html.
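As a hypothetical illustration of scoring a single network path pattern, the number of drug → target → disease paths can be read off the product of two adjacency matrices. The function name and the toy matrices are invented for this sketch; the abstract does not describe how the method scores or combines paths.

```python
def count_paths(drug_target, target_disease):
    """Entry [i][j] = number of drug i -> target -> disease j paths (a matrix product)."""
    n_targets = len(target_disease)
    n_diseases = len(target_disease[0])
    return [
        [sum(row[k] * target_disease[k][j] for k in range(n_targets))
         for j in range(n_diseases)]
        for row in drug_target
    ]


# Hypothetical toy network: 2 drugs, 3 targets, 2 diseases (1 = known link).
drug_target = [[1, 1, 0],
               [0, 0, 1]]
target_disease = [[1, 0],
                  [1, 1],
                  [0, 1]]
print(count_paths(drug_target, target_disease))  # [[2, 1], [0, 1]]
```

With real-valued entries, such as scores from a completed matrix, the same product yields weighted path scores instead of integer counts.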


Subjects
Algorithms , Drug Repositioning , Computational Biology/methods , Drug Repositioning/methods
3.
Sci Rep ; 9(1): 20025, 2019 12 27.
Article in English | MEDLINE | ID: mdl-31882773

ABSTRACT

Due to the aging world population and the increasing trend in clinical practice to treat patients with multiple drugs, adverse events (AEs) are becoming a major challenge in drug discovery and public health. In particular, identifying AEs caused by drug combinations remains a challenging task. Clinical trials typically focus on individual drugs rather than drug combinations, and animal models are unreliable. An added difficulty is the combinatorial explosion in the number of possible combinations that can be made from the increasingly large set of FDA-approved chemicals. We present a statistical and computational technique for identifying AEs caused by two-drug combinations. Taking advantage of the large and growing volume of data deposited in FDA's postmarketing reports, we demonstrate that the task of predicting AEs for two-drug combinations is amenable to the Likelihood Ratio Test (LRT). Our pAERS database, constructed with the LRT, contains almost 77 thousand associations between pairs of drugs and corresponding AEs caused solely by drug-drug interactions (DDIs). The DDIs stored in pAERS complement existing data sets. Because our statistical test is stringent, we expect many of the associations in pAERS to be unrecorded or poorly documented in the literature.
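A minimal sketch of a likelihood ratio test (the G-test) on a 2x2 table of postmarketing report counts, assuming the simplest layout: reports that mention the drug pair versus all other reports, crossed with whether the AE was reported. The abstract does not give the exact test design; in particular, it does not show how pAERS separates AEs attributable solely to the interaction from those explained by either drug alone, which this sketch does not model. The counts in the usage line are hypothetical.

```python
import math


def lrt_statistic(a, b, c, d):
    """Likelihood-ratio (G) statistic for a 2x2 table of report counts.

                        AE reported   AE not reported
        pair taken           a               b
        pair not taken       c               d
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    g = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        if obs > 0:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            expected = row * col / n  # expected count under independence
            g += obs * math.log(obs / expected)
    return 2.0 * g


def lrt_p_value(g):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))


g = lrt_statistic(30, 970, 200, 98800)  # hypothetical report counts
print(g, lrt_p_value(g))
```

Under the null hypothesis of no association, G is approximately chi-square distributed with one degree of freedom, so P(χ²₁ > G) equals erfc(√(G/2)).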


Subjects
Adverse Drug Reaction Reporting Systems , Factual Databases , Drug Combinations , Humans , United States
4.
Front Genet ; 10: 1381, 2019.
Article in English | MEDLINE | ID: mdl-32063919

ABSTRACT

Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These big data provide an unprecedented opportunity in biology, but impose great challenges in data integration, data mining, and knowledge discovery due to the complexity, heterogeneity, dynamics, uncertainty, and high dimensionality inherent in omics data. Networks have been widely used to represent relations between entities in biological systems, such as protein-protein interactions, gene regulation, and brain connectivity (i.e., network construction), as well as to infer novel relations given a reconstructed network (also known as link prediction). In particular, the heterogeneous multi-layered network (HMLN) has proven successful in integrating diverse biological data to represent the hierarchy of biological systems. The HMLN provides unparalleled opportunities but imposes new computational challenges in establishing causal genotype-phenotype associations and understanding environmental impact on organisms. In this review, we focus on recent advances in developing novel computational methods for inferring novel biological relations from the HMLN. We first discuss the properties of biological HMLNs. Then we survey four categories of state-of-the-art methods (matrix factorization, random walk, knowledge graph, and deep learning). Third, we demonstrate their applications to omics data integration and analysis. Finally, we outline strategies for future directions in the development of new HMLN models.

5.
AMIA Jt Summits Transl Sci Proc ; 2017: 132-141, 2018.
Article in English | MEDLINE | ID: mdl-29888057

ABSTRACT

Side effects are the second and the fourth leading causes of drug attrition and death, respectively, in the US. Thus, accurate prediction of side effects and understanding of their mechanisms of action will significantly impact drug discovery and clinical practice. Here, we show that REMAP, a neighborhood-regularized weighted and imputed one-class collaborative filtering algorithm, is effective in predicting drug-side effect associations from a drug-side effect association network, and significantly outperforms the state-of-the-art multi-target learning algorithm in predicting rare side effects. We also apply FASCINATE, an extension of REMAP to multi-layered networks, to infer associations among side effects and drug targets from drug-target-side effect networks. Then, using random permutation analysis and gene overrepresentation tests, we infer statistically significant side effect-pathway associations. The predicted drug-side effect associations and side effect-causing pathways are consistent with clinical evidence. We expect more novel drug-side effect associations and side effect-causing pathways to be identified when REMAP and FASCINATE are applied to large-scale chemical-gene-side effect networks.
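The abstract mentions gene overrepresentation tests without naming the test. A common choice, assumed here, is the one-sided hypergeometric test (equivalent to a one-sided Fisher exact test): how surprising is it that `overlap` of the genes linked to a side effect fall into a given pathway? The function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def overrepresentation_p_value(total_genes, pathway_genes, selected_genes, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    X counts how many of `selected_genes` genes, drawn at random from
    `total_genes`, fall into a pathway containing `pathway_genes` genes.
    """
    upper = min(pathway_genes, selected_genes)
    tail = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, selected_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, selected_genes)


# Hypothetical numbers: 20,000 genes, a 100-gene pathway, 50 side-effect genes, 5 shared.
print(overrepresentation_p_value(20000, 100, 50, 5))
```

Exact integer arithmetic via `math.comb` avoids overflow for genome-sized inputs; in practice the resulting p-values would still need correction for the many pathways tested.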

6.
Bioinformatics ; 34(16): 2835-2842, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29617731

ABSTRACT

Motivation: Adverse drug reactions (ADRs) are one of the main causes of death and a major financial burden on the world's economy. Due to the limitations of the animal model, computational prediction of serious and rare ADRs is invaluable. However, current state-of-the-art computational methods do not yield significantly better predictions of rare ADRs than random guessing. Results: We present a novel method, based on the theory of 'compressed sensing' (CS), which can accurately predict serious side-effects of candidate and market drugs. Not only is our method able to infer new chemical-ADR associations using existing noisy, biased and incomplete databases, but our data also demonstrate that the accuracy of CS in predicting a serious ADR for a candidate drug increases with increasing knowledge of other ADRs associated with the drug. In practice, this means that as the candidate drug moves up the different stages of clinical trials, the prediction accuracy of our method will increase accordingly. Availability and implementation: The program is available at https://github.com/poleksic/side-effects. Supplementary information: Supplementary data are available at Bioinformatics online.
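The abstract does not describe how compressed sensing is applied to the chemical-ADR data. As a generic illustration of the sparse-recovery step at the heart of compressed sensing, the sketch below runs iterative soft-thresholding (ISTA) on the L1-regularized least-squares problem. This is a swapped-in textbook technique, not the authors' program (which is available at the GitHub URL above); the function names are hypothetical.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau (the proximal operator of tau * |x|)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def ista(A, y, lam, step, iterations=500):
    """Iterative soft-thresholding for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1.

    Converges when step <= 1 / (largest eigenvalue of A^T A).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        gradient = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * gradient[j], step * lam) for j in range(n)]
    return x


# Identity sensing matrix: the solution is simply y soft-thresholded by lam.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(ista(A, [3.0, 0.0, 0.05], lam=0.1, step=1.0))  # approximately [2.9, 0.0, 0.0]
```

The L1 penalty drives small coefficients exactly to zero, which is why such methods can recover sparse signals from fewer measurements than unknowns.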


Subjects
Drug-Related Side Effects and Adverse Reactions , Factual Databases , Drug Discovery
7.
Article in English | MEDLINE | ID: mdl-30197820

ABSTRACT

Adverse drug reactions (ADRs) represent one of the main health and economic problems in the world. With increasing data on ADRs, there is a growing need for software tools capable of organizing and storing information on drug-ADR associations in a form that is easy to use and understand. Here we present a step-by-step computational procedure capable of extracting drug-ADR frequency data from the large collection of patient safety reports stored in the Food and Drug Administration (FDA) database. Our procedure is the first of its type capable of generating population-specific drug-ADR frequencies. The drug-ADR data generated by our method can be made specific to a single patient population group (such as gender or age), a single therapy characteristic (such as drug dosage or duration of therapy), or any combination of these.
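A minimal sketch of population-specific drug-ADR frequencies, assuming each safety report has already been parsed into a dict with hypothetical keys `drugs`, `adrs`, and any population or therapy attributes (such as `sex` or `age_group`) used as filters. The "frequency" here is the fraction of matching reports that list each ADR, a reporting rate rather than a true incidence; the paper's actual procedure may differ.

```python
from collections import Counter


def adr_frequencies(reports, drug, **filters):
    """Fraction of reports mentioning `drug` (and matching `filters`) that list each ADR.

    Report keys ('drugs', 'adrs', and filter attributes) are hypothetical.
    """
    matching = [
        r for r in reports
        if drug in r["drugs"] and all(r.get(k) == v for k, v in filters.items())
    ]
    if not matching:
        return {}
    # Count each ADR at most once per report.
    counts = Counter(adr for r in matching for adr in set(r["adrs"]))
    return {adr: n / len(matching) for adr, n in counts.items()}


reports = [  # hypothetical, already-parsed reports
    {"drugs": ["aspirin"], "adrs": ["nausea"], "sex": "F"},
    {"drugs": ["aspirin", "warfarin"], "adrs": ["bleeding", "nausea"], "sex": "F"},
    {"drugs": ["aspirin"], "adrs": ["headache"], "sex": "M"},
]
print(adr_frequencies(reports, "aspirin", sex="F"))  # nausea -> 1.0, bleeding -> 0.5
```

Passing several keyword filters at once restricts the frequencies to a combination of population and therapy characteristics.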

8.
Sci Rep ; 6: 38860, 2016 12 13.
Article in English | MEDLINE | ID: mdl-27958331

ABSTRACT

The conventional one-drug-one-gene approach has had limited success in modern drug discovery. Polypharmacology, which searches for multi-targeted drugs that perturb disease-causing networks instead of designing selective ligands that target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that it outperforms other state-of-the-art algorithms in predicting the interactions of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
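A minimal sketch of the kind of objective the abstract describes: weighted one-class matrix factorization with dual (chemical and protein) similarity regularization. The exact form, the imputed value, the weights, and the optimizer used by the authors may differ; all parameter names are hypothetical, and only the objective is computed (minimizing it, for example by gradient descent or alternating least squares, is not shown).

```python
def ocf_objective(R, U, V, chem_sim, prot_sim,
                  missing_weight=0.1, imputed=0.0, lam=0.1, alpha=0.1, beta=0.1):
    """Loss of a dual-regularized, weighted one-class factorization R ~ U V^T.

    R[i][j] is 1 for an observed chemical-protein interaction, 0 if unknown.
    Unknown entries are fit to `imputed` with the low weight `missing_weight`.
    chem_sim / prot_sim are symmetric similarity matrices; their graph-Laplacian
    terms pull similar chemicals (proteins) toward similar latent vectors.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    loss = 0.0
    for i, row in enumerate(R):
        for j, r in enumerate(row):
            target, weight = (1.0, 1.0) if r == 1 else (imputed, missing_weight)
            loss += weight * (target - dot(U[i], V[j])) ** 2
    loss += lam * (sum(dot(u, u) for u in U) + sum(dot(v, v) for v in V))
    # tr(U^T L U) equals 0.5 * sum_ab S_ab * ||u_a - u_b||^2 for symmetric S.
    loss += alpha * 0.5 * sum(chem_sim[a][b] * sq_dist(U[a], U[b])
                              for a in range(len(U)) for b in range(len(U)))
    loss += beta * 0.5 * sum(prot_sim[a][b] * sq_dist(V[a], V[b])
                             for a in range(len(V)) for b in range(len(V)))
    return loss


# Toy case: one chemical, two similar proteins, one known interaction.
print(ocf_objective([[1, 0]], [[1.0]], [[1.0], [0.0]], [[0]], [[0, 1], [1, 0]]))  # about 0.3
```

The low weight on unknown entries encodes the one-class assumption that a missing interaction is only weak evidence of non-interaction.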


Subjects
Preclinical Drug Evaluation/methods , Polypharmacology , Algorithms , Genome , Reproducibility of Results
9.
PLoS Comput Biol ; 12(10): e1005135, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27716836

ABSTRACT

Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. Off-target interactions can be responsible for either therapeutic or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs, and will provide new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the one we propose here or too computationally intensive, which limits their capability for large-scale off-target identification. In addition, most machine learning-based algorithms have mainly been evaluated on predicting off-target interactions within the same gene family for hundreds of chemicals. It is not clear how these algorithms perform in detecting off-targets across gene families on a proteome scale. Here, we present a fast and accurate off-target prediction method, REMAP, which is based on a dual-regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested on a reliable, extensive, cross-gene family benchmark, REMAP outperforms state-of-the-art methods. Furthermore, REMAP is highly scalable: it can screen a dataset of 200,000 chemicals against 20,000 proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidence.
Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP.


Subjects
Antineoplastic Agents/chemistry , Preclinical Drug Evaluation/methods , Drug Repositioning/methods , High-Throughput Screening Assays/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Molecular Targeted Therapy/methods , Protein Binding
10.
Algorithms Mol Biol ; 10: 27, 2015.
Article in English | MEDLINE | ID: mdl-26504491

ABSTRACT

BACKGROUND: Progress in the field of protein three-dimensional structure prediction depends on the development of new and improved algorithms for measuring the quality of protein models. Perhaps the best descriptor of the quality of a protein model is the GDT function, which maps each distance cutoff θ to the number of atoms in the protein model that can be fit within distance θ of the corresponding atoms in the experimentally determined structure. It has long been known that the area under the graph of this function (GDT_A) can serve as a reliable, single numerical measure of model quality. Unfortunately, while the well-known GDT_TS metric provides a crude approximation of GDT_A, no algorithm currently exists that is capable of computing accurate estimates of GDT_A. METHODS: We prove that GDT_A is well defined and that it can be approximated by Riemann sums, using available methods for computing accurate (near-optimal) GDT function values. RESULTS: In contrast to the GDT_TS metric, GDT_A is neither insensitive to large nor oversensitive to small changes in the model's coordinates. Moreover, the problem of computing GDT_A is tractable. More specifically, GDT_A can be computed in cubic asymptotic time in the size of the protein model. CONCLUSIONS: This paper presents the first algorithm capable of computing near-optimal estimates of the area under the GDT function for a protein model. We believe that the techniques implemented in our algorithm will pave the way for the development of more practical and reliable procedures for estimating 3D model quality.
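A minimal sketch of the Riemann-sum idea: given any routine that returns GDT values, the area under the curve on [0, max_cutoff] is estimated with a midpoint rule. The choice of the midpoint rule, the cutoff range, and the `toy_gdt` stand-in are assumptions. A real GDT routine maximizes the atom count over superpositions separately for each cutoff, whereas `toy_gdt` uses one fixed superposition's per-atom deviations and therefore only gives a lower bound.

```python
def gdt_area(gdt, max_cutoff, steps):
    """Midpoint Riemann-sum estimate of the area under a GDT curve on [0, max_cutoff].

    `gdt` maps a distance cutoff (Angstroms) to the number of model atoms that
    fit within it; here it stands in for an accurate (near-optimal) GDT routine.
    """
    width = max_cutoff / steps
    return sum(gdt((k + 0.5) * width) for k in range(steps)) * width


# Hypothetical per-atom deviations (Angstroms) for a single, fixed superposition.
deviations = [0.5, 1.2, 2.5, 3.9, 7.0]


def toy_gdt(theta):
    """Number of atoms within theta under the fixed superposition (a lower bound on GDT)."""
    return sum(1 for d in deviations if d <= theta)


print(gdt_area(toy_gdt, 10.0, 10000))  # close to sum(10 - d) = 34.9
```

Because the GDT function is a nondecreasing step function of the cutoff, each jump contributes at most one step width of error, so the estimate converges as the number of steps grows.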

11.
Article in English | MEDLINE | ID: mdl-23702560

ABSTRACT

The Largest Common Point-set (LCP) and Pattern Matching (PM) problems have received much attention in the fields of pattern matching, computer vision, and computational biology. Perhaps the most important application of these problems is protein structural alignment, which seeks a superposition of a pair of input proteins that maximizes a given protein structure similarity metric. Although LCP and PM have both been shown to be tractable, the running times of existing algorithms are high-degree polynomials. Here, we present novel methods for finding approximate and exact threshold-LCP and threshold-PM for r-separated sets in general, and for protein 3D structures in particular. The improved running times of our methods are achieved by building upon several different, previously published techniques.


Subjects
Algorithms , Computational Biology/methods , Automated Pattern Recognition/methods , Sequence Alignment/methods , Protein Sequence Analysis/methods , Protein Conformation , Proteins/chemistry
12.
Biomed Res Int ; 2013: 459248, 2013.
Article in English | MEDLINE | ID: mdl-23509725

ABSTRACT

The importance of pairwise protein structural comparison in biomedical research is fueling the search for algorithms capable of finding a more accurate structural match between two input proteins in a timely manner. In recent years, we have witnessed rapid advances in the development of methods for approximate and optimal solutions to the protein structure matching problem. Although slow, these methods can be extremely useful in assessing the accuracy of more efficient, heuristic algorithms. We use a recently developed approximation algorithm for protein structure matching to demonstrate that a deep search of the protein superposition space leads to increased alignment accuracy with respect to many well-established measures of alignment quality. The results of our study suggest that a large and important part of the protein superposition space remains unexplored by current techniques for protein structure alignment.


Subjects
Computational Biology/methods , Proteins/chemistry , Sequence Alignment/methods , Algorithms , Animals , Bacterial Proteins/chemistry , Protein Databases , Humans , Reproducibility of Results , Protein Sequence Analysis/methods , Software
13.
Article in English | MEDLINE | ID: mdl-22025757

ABSTRACT

We study the well-known LCP (Largest Common Point-set) under Bottleneck Distance problem. Given two proteins a and b (as sequences of points in 3D space) and a distance cutoff σ, the goal is to find a spatial superposition and an alignment that maximize the number of pairs of points from a and b that can be fit within distance σ of each other. The best algorithms to date for approximate and exact solutions to this problem run in time O(n^8) and O(n^32), respectively, where n is the protein length. This work improves the runtime of both the approximation algorithm and the algorithm for the absolute optimum, for both order-dependent and order-independent alignments. More specifically, our algorithms for near-optimal and optimal sequential alignments run in time O(n^7 log n) and O(n^14 log n), respectively. For non-sequential alignments, the corresponding running times are O(n^7.5) and O(n^14.5).
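For a single, fixed superposition, the order-independent (non-sequential) version of the score reduces to a maximum bipartite matching between points of a and b that lie within σ of each other. The sketch below uses Kuhn's augmenting-path algorithm, a standard textbook choice rather than the paper's method; it deliberately omits the search over rigid motions, which is what dominates the running times quoted above. Function names and coordinates are hypothetical.

```python
import math


def max_pairs_order_independent(a, b, sigma):
    """Largest number of one-to-one pairs (p in a, q in b) with |p - q| <= sigma.

    Assumes b has already been superimposed onto a (fixed superposition).
    Uses Kuhn's augmenting-path algorithm for bipartite matching.
    """
    close = [[j for j, q in enumerate(b) if math.dist(p, q) <= sigma] for p in a]
    match_of_b = [-1] * len(b)  # index of the point in a matched to each point in b

    def try_assign(i, seen):
        for j in close[i]:
            if not seen[j]:
                seen[j] = True
                # Take b[j] if free, or if its current partner can be re-routed.
                if match_of_b[j] == -1 or try_assign(match_of_b[j], seen):
                    match_of_b[j] = i
                    return True
        return False

    return sum(try_assign(i, [False] * len(b)) for i in range(len(a)))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
print(max_pairs_order_independent(a, b, 0.6))  # 2: the pairs may cross in sequence order
```

Because pairs may cross in sequence order, the non-sequential score is always at least the sequential score for the same superposition.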


Subjects
Computational Biology/methods , Protein Conformation , Proteins/chemistry , Algorithms
14.
Article in English | MEDLINE | ID: mdl-21904019

ABSTRACT

Protein structure alignment is an important tool in many biological applications, such as protein evolution studies, protein structure modeling, and structure-based, computer-aided drug design. Protein structure alignment is also one of the most challenging problems in computational molecular biology, due to an infinite number of possible spatial orientations of any two protein structures. We study one of the most commonly used measures of pairwise protein structure similarity, defined as the number of pairs of atoms in two proteins that can be superimposed under a predefined distance cutoff. We prove that the expected running time of a recently published algorithm for optimizing this (and some other, derived measures of protein structure similarity) is polynomial.


Subjects
Algorithms , Proteins/chemistry , Protein Databases , Molecular Models , Protein Conformation
15.
J Bioinform Comput Biol ; 9(3): 367-82, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21714130

ABSTRACT

The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods.
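For reference, here is a sketch of the quadratic baseline the abstract refers to, not the subquadratic algorithm it presents: for a fixed superposition, the maximum number of order-preserving residue pairs within the cutoff follows from a Smith-Waterman/LCS-style dynamic program that scores 1 per close pair and uses no gap penalties. The function name and coordinates are hypothetical.

```python
import math


def max_pairs_sequential(a, b, sigma):
    """Max number of order-preserving pairs (a_i, b_j) with |a_i - b_j| <= sigma.

    Assumes b is already superimposed onto a. Runs in O(len(a) * len(b)) time,
    keeping only the previous DP row in memory.
    """
    prev = [0] * (len(b) + 1)
    for p in a:
        curr = [0] * (len(b) + 1)
        for j, q in enumerate(b, start=1):
            hit = 1 if math.dist(p, q) <= sigma else 0
            # Skip a residue of a, skip a residue of b, or align p with q.
            curr[j] = max(prev[j], curr[j - 1], prev[j - 1] + hit)
        prev = curr
    return prev[-1]


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
print(max_pairs_sequential(a, b, 0.6))  # 1: the two close pairs would cross
```

Rerunning this quadratic routine for every candidate superposition is exactly the bottleneck that a faster alignment subroutine relieves.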


Subjects
Algorithms , Sequence Alignment/statistics & numerical data , Protein Sequence Analysis/statistics & numerical data , Amino Acid Sequence , Computational Biology , Protein Databases/statistics & numerical data , Molecular Sequence Data , Proteins/chemistry , Proteins/genetics , Software , Protein Structural Homology
16.
Bioinformatics ; 25(21): 2751-6, 2009 Nov 01.
Article in English | MEDLINE | ID: mdl-19734152

ABSTRACT

MOTIVATION: Structural alignment is an important tool for understanding the evolutionary relationships between proteins. However, finding the best pairwise structural alignment is difficult, due to the infinite number of possible superpositions of two structures. Unlike the sequence alignment problem, which has a polynomial time solution, the structural alignment problem has not even been classified as solvable. RESULTS: We study one of the most widely used measures of protein structural similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. We prove that, for any two proteins, this measure can be optimized for all but finitely many distance cutoffs. Our method leads to a series of algorithms for optimizing other structure similarity measures, including the measures commonly used in protein structure prediction experiments. We also present a polynomial time algorithm for finding a near-optimal superposition of two proteins. Aside from having a relatively low cost, the algorithm for the near-optimal solution returns a superposition of provable quality. In other words, the difference between the score of the returned superposition and the score of an optimal superposition can be explicitly computed and used to determine whether the returned superposition is, in fact, the best superposition. CONTACT: poleksic@cs.uni.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Protein Databases , Automated Pattern Recognition , Protein Sequence Analysis
17.
BMC Bioinformatics ; 10: 112, 2009 Apr 20.
Article in English | MEDLINE | ID: mdl-19379500

ABSTRACT

BACKGROUND: In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many experiments suggest that the distribution of local profile-profile alignment scores is of the Gumbel form. However, estimating distribution parameters by random simulations turns out to be computationally very expensive. RESULTS: We demonstrate that the background distribution of profile-profile alignment scores heavily depends on profiles' composition and thus the distribution parameters must be estimated independently, for each pair of profiles of interest. We also show that accurate estimates of statistical parameters can be obtained using the "island statistics" for profile-profile alignments. CONCLUSION: The island statistics can be generalized to profile-profile alignments to provide an efficient method for the alignment score normalization. Since multiple island scores can be extracted from a single comparison of two profiles, the island method has a clear speed advantage over the direct shuffling method for comparable accuracy in parameter estimates.
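A minimal sketch of how Gumbel-form statistics turn an alignment score into a p-value, plus a simplified island-style estimate of the decay rate λ. The λ estimator assumes continuous scores with an exponential tail above a cutoff (for integer-valued scores a discrete, geometric-tail version of the estimator would be used instead), and the constant `k` is assumed to have been estimated separately; these are assumptions, not the paper's exact procedure. All names and numbers are hypothetical.

```python
import math


def estimate_lambda(island_scores, cutoff):
    """Maximum-likelihood decay rate from island peak scores above `cutoff`.

    Simplification: treats scores as continuous with an exponential tail,
    so lambda = 1 / mean(score - cutoff).
    """
    excess = [s - cutoff for s in island_scores if s >= cutoff]
    if not excess or sum(excess) == 0:
        raise ValueError("need island scores strictly above the cutoff")
    return len(excess) / sum(excess)


def alignment_p_value(score, lam, k, m, n):
    """P(S >= score) for a Gumbel-distributed maximum score with parameters lam, k.

    m and n are the lengths of the two profiles being compared.
    """
    expected_hits = k * m * n * math.exp(-lam * score)  # Karlin-Altschul-style E-value
    return -math.expm1(-expected_hits)  # 1 - exp(-E), accurate when E is tiny


lam = estimate_lambda([12.0, 15.5, 11.2, 19.8, 13.1], cutoff=10.0)  # hypothetical islands
print(lam, alignment_p_value(40.0, lam, k=0.1, m=200, n=250))
```

Because many islands can be harvested from a single profile-profile comparison, λ can be estimated per profile pair without the cost of repeated shuffling.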


Subjects
Computational Biology/methods , Statistical Models , Sequence Alignment/methods , Proteins/chemistry , Protein Sequence Analysis/methods
18.
J Bioinform Comput Biol ; 6(2): 335-45, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18464326

ABSTRACT

Measuring the accuracy of protein three-dimensional structures is one of the most important problems in protein structure prediction. For structure-based drug design, the accuracy of the binding site is far more important than the accuracy of any other region of the protein. We have developed an automated method for assessing the quality of a protein model by focusing on the set of residues in the small molecule binding site. Small molecule binding sites typically involve multiple regions of the protein coming together in space, and their accuracy has been observed to be sensitive to even small alignment errors. In addition, ligand binding sites contain the critical information required for drug design, making their accuracy particularly important. We analyzed the accuracy of the binding sites on two sets of protein models: the predictions submitted by the top-performing CASP7 groups, and the models generated by four widely used homology modeling packages. The results of our CASP7 analysis significantly differ from the previous findings, implying that the binding site measure does not correlate with the traditional model quality measures used in the structure prediction benchmarks. For the modeling programs, the resolution of binding sites is extremely sensitive to the degree of sequence homology between the query and the template, even when the most accurate alignments are used in the homology modeling process.
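The abstract does not name the binding-site measure, so the sketch below assumes a simple one: RMSD computed only over the residues that form the small-molecule binding site, after the model and the reference have already been superimposed. Residue IDs, coordinates, and the function name are hypothetical; one representative atom per residue is assumed.

```python
import math


def binding_site_rmsd(model, reference, site_residues):
    """RMSD restricted to binding-site residues.

    `model` and `reference` map residue IDs to (x, y, z) coordinates of one
    representative atom and are assumed to be already superimposed.
    """
    shared = [r for r in site_residues if r in model and r in reference]
    if not shared:
        raise ValueError("no binding-site residue is present in both structures")
    total = sum(math.dist(model[r], reference[r]) ** 2 for r in shared)
    return math.sqrt(total / len(shared))


model = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (10.0, 0.0, 0.0)}
reference = {1: (0.0, 0.0, 0.0), 2: (1.0, 1.0, 0.0), 3: (0.0, 0.0, 0.0)}
print(binding_site_rmsd(model, reference, [1, 2]))  # residue 3 lies outside the site
```

Silently skipping site residues missing from the model is a design choice in this sketch; a stricter measure would penalize them, since an incomplete binding site is itself a modeling error.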


Subjects
Protein Conformation , Proteins/chemistry , Animals , Computational Biology , Protein Databases , Humans , Molecular Models
19.
Bioinformatics ; 24(9): 1145-53, 2008 May 01.
Article in English | MEDLINE | ID: mdl-18337259

ABSTRACT

MOTIVATION: Profile-based protein homology detection algorithms are valuable tools in genome annotation and protein classification. By utilizing information present in the sequences of homologous proteins, profile-based methods are often able to detect extremely weak relationships between protein sequences, as evidenced by the large-scale benchmarking experiments such as CASP and LiveBench. RESULTS: We study the relationship between the sensitivity of a profile-profile method and the size of the sequence profile, which is defined as the average number of different residue types observed at the profile's positions. We also demonstrate that improvements in the sensitivity of a profile-profile method can be made by incorporating a profile-dependent scoring scheme, such as position-specific background frequencies. The techniques presented in this article are implemented in an alignment algorithm UNI-FOLD. When tested against other well-established methods for fold recognition, UNI-FOLD shows increased sensitivity and specificity in detecting remote relationships between protein sequences. AVAILABILITY: UNI-FOLD web server can be accessed at http://blackhawk.cs.uni.edu
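The profile size defined above (the average number of different residue types observed at the profile's positions) can be computed directly from the aligned sequences that make up a profile. This sketch assumes an unweighted multiple alignment given as equal-length strings and, as an assumption, does not count the gap character '-' as a residue type; the function name is hypothetical.

```python
def profile_size(alignment):
    """Average number of distinct residue types per column of an alignment.

    `alignment` is a list of equal-length aligned sequences; gaps ('-') are
    not counted as residue types.
    """
    length = len(alignment[0])
    distinct = [len({seq[i] for seq in alignment} - {"-"}) for i in range(length)]
    return sum(distinct) / length


print(profile_size(["ACD", "AC-", "GCE"]))  # (2 + 1 + 2) / 3, about 1.667
```

Real profile construction usually applies sequence weighting and pseudocounts, which this raw count ignores.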


Subjects
Algorithms , Gene Expression Profiling/methods , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Amino Acid Sequence , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity
20.
Proteins ; 65(4): 953-8, 2006 Dec 01.
Article in English | MEDLINE | ID: mdl-17006949

ABSTRACT

We present a novel, knowledge-based method for the side-chain addition step in protein structure modeling. The foundation of the method is a conditional probability equation, which specifies the probability that a side-chain will occupy a specific rotamer state, given a set of evidence about the rotamer states adopted by the side-chains at aligned positions in structurally homologous crystal structures. We demonstrate that our method increases the accuracy of homology model side-chain addition when compared with the widely employed practice of preserving the side-chain conformation from the homology template to the target at conserved residue positions. Furthermore, we demonstrate that our method accurately estimates the probability that the correct rotamer state has been selected. This interesting result implies that our method can be used to understand the reliability of each and every side-chain in a protein homology model.
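The abstract's conditional probability equation is not given, so the sketch below is a deliberate simplification: it estimates P(target rotamer | template rotamer) from counted pairs of rotamer states observed at structurally aligned positions, with Laplace smoothing. The paper conditions on a set of evidence from several homologous structures, which this single-template version does not capture; all names and rotamer labels are hypothetical.

```python
from collections import Counter


def rotamer_probabilities(observed_pairs, template_rotamer, rotamer_states, pseudocount=1.0):
    """Estimate P(target rotamer | template rotamer) from counted examples.

    `observed_pairs` holds (template_state, target_state) pairs collected from
    structurally aligned positions; the pseudocount avoids zero probabilities.
    """
    counts = Counter(t for s, t in observed_pairs if s == template_rotamer)
    total = sum(counts.values()) + pseudocount * len(rotamer_states)
    return {r: (counts[r] + pseudocount) / total for r in rotamer_states}


pairs = [("g+", "g+"), ("g+", "g+"), ("g+", "t"), ("t", "t")]  # hypothetical data
print(rotamer_probabilities(pairs, "g+", ["g+", "g-", "t"]))  # g+: 0.5, g-: 1/6, t: 1/3
```

The probability assigned to the chosen rotamer doubles as a confidence estimate for that side-chain, which is the kind of per-residue reliability the abstract describes.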


Subjects
Molecular Models , Proteins/chemistry , Sequence Alignment/methods , Protein Structural Homology , Amino Acid Sequence , Computer Simulation , Protein Databases , Protein Conformation , Amino Acid Sequence Homology